Statement#

Given two strings, str1 and str2, find the minimum edit distance required to convert str1 into str2. Minimum edit distance is the minimum number of insertions, deletions, or substitutions required to transform str1 into str2.

Note: This problem has a direct application in the autocorrect feature. It is also used in bioinformatics to quantify the similarity between two DNA sequences.

The visualization below shows all these operations. Each operation has a cost of one unit. Therefore, you want to find the minimum cost of converting str1 into str2. The following visualization also shows how different sequences of operations can entail different costs.

Created with Fabric.js 3.6.6 t e h t h e Happens a lot right? You try to type "the" but due to one hot finger you end up typing "teh".

1 of 10

Created with Fabric.js 3.6.6 t h h t h e Now you can either add an "h" after "t", or you can change "e" to "h", let's do that later.

2 of 10

Created with Fabric.js 3.6.6 t h h t h e Now again, we can either change "h" to "e" or remove the "h" instead. Let's, for sake of the example, remove "h".

3 of 10

Created with Fabric.js 3.6.6 t h h t h e Now again, we can either change "h" to "e" or remove the "h" instead. Let's, for sake of the example, remove "h".

4 of 10

Created with Fabric.js 3.6.6 t h t h e Now we only need to insert an "e".

5 of 10

Created with Fabric.js 3.6.6 t h e t h e Now we only need to insert an "e".

6 of 10

Created with Fabric.js 3.6.6 t h e t h e Now we only need to insert an "e".

7 of 10

Created with Fabric.js 3.6.6 t e h t h e t h e t h e Here we have converted "teh" into "the".

8 of 10

Created with Fabric.js 3.6.6 t e h t h e t e h _ t h _ e This conversion required 1 change (grey), 1 deletion (red) and 1 insertion (purple). We can represent this with the above alignment. The total cost here will be 3.

9 of 10

Created with Fabric.js 3.6.6 t e h t h e t e h t h e Of course we could have converted with lower cost i.e. by two changes (cost=2).

10 of 10

The alignment depiction in the above visualization is a hint toward the solution. You basically need to find an alignment between two strings that has the least cost. When characters match, there is no cost, but there is a cost of one unit when characters do not match (insertion, deletion, or substitution), or when a character is skipped (insertion or removal in str1) in either of the strings.

Constraints:

  • 11 \leq str1.length, str2.length 500\leq 500
  • str1 and str2 are in the same case English letters

Examples#

No.

str1

str2

Edit Distance

1

sunday

saturday

3

2

sam

sum

1

Try it yourself#

Implement your solution in the following playground.

Python
usercode > main.py
Input #1
Input #2
Edit Distance

Note: If you clicked the “Submit” button and the code timed out, this means that your solution needs to be optimized in terms of running time.

Hint: Use dynamic programming and see the magic.

Solution#

We will first explore the naive recursive solution to this problem and then see how it can be improved using the Longest Common Substring dynamic programming pattern.

Naive approach#

A naive approach is to process all the characters of each string, one by one. Starting from the last character of both strings, there are two possibilities for every pair of characters being considered:

  • The last characters of both the strings match, in which case, the lengths of both the strings are reduced by one, and the function is called recursively on the remaining strings.

  • The last characters of both the strings do not match, in which case, all three operations (insertion, deletion, and substitution) are carried out on the last character of the first string.

  • The minimum cost for all three operations is computed recursively and returned.

Let’s implement the algorithm as discussed above:

Edit Distance using recursion

Note: Please observe that if you include the test case commented out in the driver code, the solution is likely to time out. Try to solve the larger problem using the dynamic programming solutions provided below and see the difference.

Time complexity#

The time complexity of the naive approach is exponential, i.e., O(3n)O(3^n), where nn is the length of the longest string.

Space complexity#

As the maximum depth of the recursive call tree is nn, and each call stores its data on the stack, the space complexity of this approach is O(n)O(n), where nn is the length of the longest string.

Optimized Solution using dynamic programming#

Now, let’s improve the recursive solution using top-down and bottom-up approaches.

Top-down solution#

The top-down solution, commonly known as the memoization technique, is an enhancement of the recursive solution. It overcomes the problem of calculating redundant solutions over and over again by storing them in an array. In the naive approach, it computes the minimum edit distance of the same substrings many times. Therefore, we need to memoize by using a 2-D array, lookup_table. This array is initialized using the lengths of both strings in the min_edit_dist function.

The basic algorithm works exactly the same as the one in the naive approach. The only difference is that for each character, the minimum edit distance value is stored in the lookup_table.

  • If the end characters match, the value returned is stored at lookup_table[m-1][n-1]

  • If the end characters don’t match, all three operations (insertion, deletion, and substitution) are carried out on the last character of the first string. The minimum cost for all three operations is then computed and stored in the lookup_table[m-1][n-1]

Realize that since the values returned are being stored in the lookup_table, the next time the function is called with the same arguments, the recursive call won’t be made, and instead the value will be directly returned from the lookup_table.

Let’s implement the algorithm as discussed above:

Edit Distance using memoization

Note: Please observe that if you now uncomment the large test case in the driver code, the dynamic programming solution will still work.

Time complexity#

As we avoided redundant calculations by storing all the intermediate results in a 2-D array, the time complexity of this approach is reduced to O(n2)O(n ^2), where nn is the length of the longest string.

Space complexity#

We now need O(n2)O(n ^2) space in memory to store intermediate values.

Bottom-up solution#

The bottom-up solution, also known as the tabulation technique, is an iterative method of solving dynamic programming problems. In this solution, we will again make use of a 2-D array, lookup_table, that stores the minimum edit distance required to convert the first string into the second string for each character.

In this case, we build the solution bottom-up by dividing our problem into subproblems.

  • We know that if the first string is empty, we perform the insert operation on all characters of the second string to make both strings similar. Therefore, the minimum number of operations will equal j. Similarly, if the second string is empty, the delete operation is performed on all elements of the first string to make it similar to the second string. So, the minimum number of operations will equal i. Since this is known, we fill the lookup_table accordingly.

  • If the last characters are the same, we store the returned values from lookup_table[i-1][j-1] in lookup_table[i][j].

  • For characters that don’t match, the algorithm works by performing all three operations on the last character of the first string and calculating the minimum edit distance required to convert the first string into the second string. This value is then stored in the lookup_table to avoid recalculations.

Let’s look at the following illustration to get a better understanding of the solution:

Created with Fabric.js 3.6.6 Let's calculate the edit distance between str1 = "azc" and str2="abcde" i 0 1 2 3 j a z c 0 0 1 2 3 1 a 1 2 b 2 3 c 3 4 d 4 5 e 5

1 of 22

Created with Fabric.js 3.6.6 Each position (i,j) represents the edit distance between str1[i] and str2[j] i 0 1 2 3 j a z c 0 0 1 2 3 1 a 1 2 b 2 3 c 3 4 d 4 5 e 5

2 of 22

Created with Fabric.js 3.6.6 i 0 1 2 3 j a z c 0 0 1 2 3 1 a 1 2 b 2 3 c 3 4 d 4 5 e 5 So lookup_table[3][1] will represent the edit distance between "abc" and "a"

3 of 22

Created with Fabric.js 3.6.6 The rows that are currently reddish represent the edit distance between each string and an empty string i 0 1 2 3 j a z c 0 0 1 2 3 1 a 1 2 b 2 3 c 3 4 d 4 5 e 5

4 of 22

Created with Fabric.js 3.6.6 To calculate the value at lookup_table[m][n], we add the minimum of lookup_table[m-1][n], lookup_table[m-1][n-1], lookup_table[m][n-1] to 1 if the characters str1[m] != str2[n] and 0 otherwise i 0 1 2 3 j a z c 0 0 1 2 3 1 a 1 2 b 2 3 c 3 4 d 4 5 e 5

5 of 22

Created with Fabric.js 3.6.6 Let's try for lookup_table[1][1] i 0 1 2 3 j a z c 0 0 1 2 3 1 a 1 x 2 b 2 3 c 3 4 d 4 5 e 5

6 of 22

Created with Fabric.js 3.6.6 lookup_table[1][1] = 0 because 'a' == 'a' and min(0,1,1) is 0 i 0 1 2 3 j a z c 0 0 1 2 3 1 a 1 0 2 b 2 3 c 3 4 d 4 5 e 5

7 of 22

Created with Fabric.js 3.6.6 Let's fill the rest of the table in a similar fashion i 0 1 2 3 j a z c 0 0 1 2 3 1 a 1 0 2 b 2 1 3 c 3 4 d 4 5 e 5

8 of 22

Created with Fabric.js 3.6.6 Let's fill the rest of the table in a similar fashion i 0 1 2 3 j a z c 0 0 1 2 3 1 a 1 0 2 b 2 1 3 c 3 2 4 d 4 5 e 5

9 of 22

Created with Fabric.js 3.6.6 Let's fill the rest of the table in a similar fashion i 0 1 2 3 j a z c 0 0 1 2 3 1 a 1 0 2 b 2 1 3 c 3 2 4 d 4 3 5 e 5

10 of 22

Created with Fabric.js 3.6.6 Let's fill the rest of the table in a similar fashion i 0 1 2 3 j a z c 0 0 1 2 3 1 a 1 0 2 b 2 1 3 c 3 2 4 d 4 3 5 e 5 4

11 of 22

Created with Fabric.js 3.6.6 Let's fill the rest of the table in a similar fashion i 0 1 2 3 j a z c 0 0 1 2 3 1 a 1 0 1 2 b 2 1 3 c 3 2 4 d 4 3 5 e 5 4

12 of 22

Created with Fabric.js 3.6.6 Let's fill the rest of the table in a similar fashion i 0 1 2 3 j a z c 0 0 1 2 3 1 a 1 0 1 2 b 2 1 1 3 c 3 2 4 d 4 3 5 e 5 4

13 of 22

Created with Fabric.js 3.6.6 Let's fill the rest of the table in a similar fashion i 0 1 2 3 j a z c 0 0 1 2 3 1 a 1 0 1 2 b 2 1 1 3 c 3 2 2 4 d 4 3 5 e 5 4

14 of 22

Created with Fabric.js 3.6.6 Let's fill the rest of the table in a similar fashion i 0 1 2 3 j a z c 0 0 1 2 3 1 a 1 0 1 2 b 2 1 1 3 c 3 2 2 4 d 4 3 3 5 e 5 4

15 of 22

Created with Fabric.js 3.6.6 Let's fill the rest of the table in a similar fashion i 0 1 2 3 j a z c 0 0 1 2 3 1 a 1 0 1 2 b 2 1 1 3 c 3 2 2 4 d 4 3 3 5 e 5 4 4

16 of 22

Created with Fabric.js 3.6.6 Let's fill the rest of the table in a similar fashion i 0 1 2 3 j a z c 0 0 1 2 3 1 a 1 0 1 2 2 b 2 1 1 3 c 3 2 2 4 d 4 3 3 5 e 5 4 4

17 of 22

Created with Fabric.js 3.6.6 Let's fill the rest of the table in a similar fashion i 0 1 2 3 j a z c 0 0 1 2 3 1 a 1 0 1 2 2 b 2 1 1 2 3 c 3 2 2 4 d 4 3 3 5 e 5 4 4

18 of 22

Created with Fabric.js 3.6.6 Let's fill the rest of the table in a similar fashion i 0 1 2 3 j a z c 0 0 1 2 3 1 a 1 0 1 2 2 b 2 1 1 2 3 c 3 2 2 1 4 d 4 3 3 5 e 5 4 4

19 of 22

Created with Fabric.js 3.6.6 Let's fill the rest of the table in a similar fashion i 0 1 2 3 j a z c 0 0 1 2 3 1 a 1 0 1 2 2 b 2 1 1 2 3 c 3 2 2 1 4 d 4 3 3 2 5 e 5 4 4

20 of 22

Created with Fabric.js 3.6.6 Let's fill the rest of the table in a similar fashion i 0 1 2 3 j a z c 0 0 1 2 3 1 a 1 0 1 2 2 b 2 1 1 2 3 c 3 2 2 1 4 d 4 3 3 2 5 e 5 4 4 3

21 of 22

Created with Fabric.js 3.6.6 i 0 1 2 3 j a z c 0 0 1 2 3 1 a 1 0 1 2 2 b 2 1 1 2 3 c 3 2 2 1 4 d 4 3 3 2 5 e 5 4 4 3 And the answer is 3

22 of 22

Let’s implement the algorithm as discussed above:

Edit Distance using tabulation

Note: Please observe that if you now uncomment the large test case in the driver code, the dynamic programming solution will still work.

Time complexity#

As we created a 2-D array to store the results of sub-problems that would be used later on, the time complexity of this approach will also be O(n2)O(n^2).

Space complexity#

We need O(n2)O(n^2) space in memory to store the intermediate values.

Minimum Number of Deletions and Insertions

Longest Repeating Subsequence